A Simple Approach to Unknown Word Processing in Japanese Morphological Analysis
Abstract
This paper presents a simple but effective approach to unknown word processing in Japanese morphological analysis. The approach handles (1) unknown words derived from words in a predefined lexicon and (2) unknown onomatopoeias. It leverages derivation rules and onomatopoeia patterns to correctly recognize certain types of unknown words. Experiments showed that it recognized about 4,500 unknown words in 100,000 Web sentences, with only 80 harmful side effects and a 6% loss in speed.
Similar resources
Corpus-based Japanese morphological analysis
The goal of this study is to improve corpus-based Japanese morphological analysis, which consists of word segmentation and part-of-speech (POS) tagging. We divide the problem of Japanese morphological analysis into three subproblems: models for known words, models for unknown words, and a corpus maintenance schema. First, we discuss Markov model-based approaches for known word processing. ...
Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts
Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of these NLP tools take dictionary-based approaches. As a result, they are often not very effective on texts written in casual wording or texts that contain many domain-specific terms, because of their limited vocabulary. In this paper we propose a simpl...
The Unknown Word Problem: a Morphological Analysis of Japanese Using Maximum Entropy Aided by a Dictionary
In this paper we describe a morphological analysis method based on a maximum entropy model. This method uses a model that can not only consult a dictionary with a large amount of lexical information but can also identify unknown words by learning certain characteristics. The model has the potential to overcome the unknown word problem.
Chart-driven Connectionist Categorial Parsing of Spoken Korean
While most speech and natural language systems developed for English and other Indo-European languages neglect morphological processing and integrate speech and natural language at the word level, morphological processing plays a major role in language processing for agglutinative languages such as Korean and Japanese, since these languages have very complex m...
Iranian EFL Learners' Processing of English Derived Words
An interesting area of psycholinguistic inquiry is discovering how morphological structures are stored in the human mind and how they are retrieved during the comprehension or production of language. The current study probed what goes on in the minds of EFL learners when they process derivational morphology, and how English and Persian derivational suffixes are processed. 60 Iranian EFL learne...